Using Hints to Improve Inline Block-layer Deduplication

Authors

Sonam Mandal

Geoffrey H. Kuenning

Dongju Ok

Varun Shastry

Philip Shilane

Sun Zhen

Vasily Tarasov

Erez Zadok

Abstract

Block-layer data deduplication allows file systems and applications to reap the benefits of deduplication without requiring per-system or per-application modifications. However, important information about data context (e.g., data vs. metadata writes) is lost at the block layer. Passing such context to the block layer can help improve deduplication performance and reliability. We implemented a hinting interface in an open-source block-layer deduplication system, dmdedup, that passes relevant context to the block layer, and evaluated two hints, NODEDUP and PREFETCH. To allow upper storage layers to pass hints based on the available context, we modified the VFS and file system layers to expose a hinting interface to user applications. We show that passing the NODEDUP hint speeds up applications by up to 5.3× on modern machines because the overhead of deduplication is avoided when it is unlikely to be beneficial. We also show that the PREFETCH hint accelerates applications up to 1.8× by caching hashes for data that is likely to be accessed soon.
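The effect of the two hints can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions and not dmdedup's actual code: the class `ToyDedupStore`, its fields, the string constants used as hints, and the reverse map from physical block to hash are all hypothetical names introduced for illustration. It shows a write path that skips hashing and index lookup when NODEDUP is set, and a read path that warms a hash cache when PREFETCH is set.

```python
import hashlib

# Hypothetical hint flags, named after the hints in the abstract.
NODEDUP = "NODEDUP"
PREFETCH = "PREFETCH"


class ToyDedupStore:
    """Toy in-memory model of a deduplicating block target (illustrative only)."""

    def __init__(self):
        self.hash_index = {}    # digest -> physical block number (stands in for the slow index)
        self.hash_cache = {}    # digest -> physical block number (fast cache filled by PREFETCH)
        self.pbn_to_hash = {}   # physical block number -> digest (assumed reverse map)
        self.lbn_to_pbn = {}    # logical block number -> physical block number
        self.physical = []      # contents of physical blocks
        self.hashes_computed = 0
        self.index_lookups = 0

    def _allocate(self, data):
        # Store the data in a new physical block and return its number.
        self.physical.append(data)
        return len(self.physical) - 1

    def write(self, lbn, data, hint=None):
        if hint == NODEDUP:
            # Caller says deduplication is unlikely to pay off (e.g. metadata):
            # skip hashing and index lookup, always allocate a fresh block.
            self.lbn_to_pbn[lbn] = self._allocate(data)
            return

        digest = hashlib.sha256(data).digest()
        self.hashes_computed += 1

        pbn = self.hash_cache.get(digest)
        if pbn is None:
            # Cache miss: fall back to the (notionally slow) index.
            self.index_lookups += 1
            pbn = self.hash_index.get(digest)

        if pbn is None:
            # Unique data: allocate a block and record its hash.
            pbn = self._allocate(data)
            self.hash_index[digest] = pbn
            self.pbn_to_hash[pbn] = digest

        self.lbn_to_pbn[lbn] = pbn

    def read(self, lbn, hint=None):
        pbn = self.lbn_to_pbn[lbn]
        if hint == PREFETCH:
            # Caller expects this data to be written again soon (e.g. a copy):
            # load its hash into the cache so the later write avoids the index.
            digest = self.pbn_to_hash.get(pbn)
            if digest is not None:
                self.hash_cache[digest] = pbn
        return self.physical[pbn]
```

In this model, two NODEDUP writes of identical data consume two physical blocks but compute no hashes, while a PREFETCH read followed by a write of the same data is resolved from the cache without touching the index.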


Similar resources

Design and Implementation of an Open-Source Deduplication Platform for Research (research proficiency exam)

Data deduplication is a technique used to improve storage utilization by eliminating duplicate data. Duplicate data blocks are not stored and instead a reference to the original data block is updated. Unique data chunks are identified using techniques such as hashing, and an index of all the existing chunks is maintained. When new data blocks are written, they are hashed and compared to the has...


Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality


HPDedup: A Hybrid Prioritized Data Deduplication Mechanism for Primary Storage in the Cloud

Eliminating duplicate data in primary storage of clouds increases the cost-efficiency of cloud service providers as well as reduces the cost of users for using cloud services. Most existing primary deduplication techniques either use inline caching to exploit locality in primary workloads or use postprocessing deduplication running in system idle time to avoid the negative impact on I/O perform...


A Context Aware Block Layer: The Case for Block Layer Deduplication



ChunkStash: Speeding Up Inline Storage Deduplication Using Flash Memory

Storage deduplication has received recent interest in the research community. In scenarios where the backup process has to complete within short time windows, inline deduplication can help to achieve higher backup throughput. In such systems, the method of identifying duplicate data, using disk-based indexes on chunk hashes, can create throughput bottlenecks due to disk I/Os involved in index l...



Publication date: 2016
